Machine Learning on Statistical Manifold
Abstract
This senior thesis project explores and generalizes some fundamental machine learning algorithms from Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use statistical distances as the measure of dissimilarity between objects. We describe a situation where the clustering of probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate that the statistical-distance-based clustering algorithms often outperform the same algorithms with the Euclidean distance in many complex scenarios. In particular, we apply our statistical-distance-based hierarchical and k-means clustering algorithms to univariate normal distributions with k = 2 and k = 3 clusters, bivariate normal distributions with a diagonal covariance matrix and k = 3 clusters, and discrete Poisson distributions with k = 3 clusters. Finally, we prove that the k-means clustering algorithm applied to discrete distributions with the Hellinger distance converges not only to a partial optimal solution but also to a local minimum.
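A Hellinger-distance k-means of the kind the abstract describes can be sketched as follows. This is a minimal Lloyd-style iteration, not the thesis's own implementation; the function names are illustrative, and the closed-form centroid update (the normalized squared mean of the square roots, which minimizes the summed squared Hellinger distances within a cluster) is an assumption noted in the comments.

```python
import numpy as np

def hellinger(p, q):
    # Hellinger distance between two discrete distributions p and q:
    # H(p, q) = (1/sqrt(2)) * || sqrt(p) - sqrt(q) ||_2
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)

def hellinger_kmeans(dists, k, n_iter=100, seed=0):
    """Lloyd-style k-means on discrete distributions under the Hellinger distance.

    dists: (n, m) array; each row is a probability vector summing to 1.
    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    n = dists.shape[0]
    # initialize centroids from k distinct data points (fancy indexing copies)
    centroids = dists[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.zeros(n, dtype=int)
    for it in range(n_iter):
        # assignment step: nearest centroid in Hellinger distance
        d = np.array([[hellinger(x, c) for c in centroids] for x in dists])
        new_labels = d.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        # update step: the within-cluster minimizer of the summed squared
        # Hellinger distances is the normalized squared mean of the square
        # roots of the member distributions (derivable via Lagrange multipliers)
        for j in range(k):
            members = dists[labels == j]
            if len(members):
                s = np.sqrt(members).mean(axis=0) ** 2
                centroids[j] = s / s.sum()
    return labels, centroids
```

Because the Hellinger distance is, up to a constant, the Euclidean distance between the square-root vectors, this amounts to ordinary k-means in square-root coordinates followed by renormalization of each centroid back onto the probability simplex.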
Similar resources
Information Geometric Density Estimation
We investigate kernel density estimation where the kernel function varies from point to point. Density estimation in the input space means to find a set of coordinates on a statistical manifold. This novel perspective helps to combine efforts from information geometry and machine learning to spawn a family of density estimators. We present example models with simulations. We discuss the princip...
Dissimilarity Data in Statistical Model Building and Machine Learning
We explore three papers concerned with two methods for incorporating discrete, noisy, incomplete dissimilarity data into statistical/machine learning models for supervised, semi-supervised, or unsupervised machine learning. The two methods are RKE (Regularized Kernel Estimation) and RMU (Regularized Manifold Unfolding). Briefly put, the methods use dissimilarity information between objects in a ...
Some Research Problems in Metric Learning and Manifold Learning
In the past few years, metric learning, semi-supervised learning, and manifold learning methods have aroused a great deal of interest in the machine learning community. Many machine learning and pattern recognition algorithms rely on a distance metric. Instead of choosing the metric manually, a promising approach is to learn the metric from data automatically. Besides some early work on metric ...
Multiscale Dictionary Learning: Non-Asymptotic Bounds and Robustness
High-dimensional datasets are well-approximated by low-dimensional structures. Over the past decade, this empirical observation motivated the investigation of detection, measurement, and modeling techniques to exploit these low-dimensional intrinsic structures, yielding numerous implications for high-dimensional statistics, machine learning, and signal processing. Manifold learning (where the l...
Tensor Balancing on Statistical Manifold
We solve tensor balancing, rescaling an Nth-order nonnegative tensor by multiplying N tensors of order N − 1 so that every fiber sums to one. This generalizes a fundamental process of matrix balancing used to compare matrices in a wide range of applications from biology to economics. We present an efficient balancing algorithm with quadratic convergence using Newton's method and show in numeric...
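For the classical matrix case that tensor balancing generalizes, the standard process is the Sinkhorn–Knopp iteration: alternately rescale rows and columns of a positive matrix until it is doubly stochastic. The sketch below shows that baseline iteration, not the quadratically convergent Newton-method algorithm of the paper; the function name and tolerance are illustrative.

```python
import numpy as np

def sinkhorn_balance(A, n_iter=1000, tol=1e-10):
    """Sinkhorn-Knopp matrix balancing.

    Alternately rescales the rows and columns of a positive matrix A
    until every row and column sums to one (a doubly stochastic matrix).
    """
    A = np.array(A, dtype=float)
    for _ in range(n_iter):
        A /= A.sum(axis=1, keepdims=True)  # normalize each row to sum 1
        A /= A.sum(axis=0, keepdims=True)  # normalize each column to sum 1
        # after the column step, columns are exact; stop once rows are too
        if np.max(np.abs(A.sum(axis=1) - 1.0)) < tol:
            break
    return A
```

For a strictly positive matrix this iteration converges geometrically; the paper's contribution is an information-geometric formulation that yields quadratic (Newton) convergence and extends the process from matrices to higher-order tensors.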